
    K-nearest Neighbor Search by Random Projection Forests

    K-nearest neighbor (kNN) search has wide applications in data mining, machine learning, statistics, and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests) for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees, each constructed recursively through a series of carefully chosen random projections. rpForests achieves remarkable accuracy, measured by fast decay in the missing rate of kNNs and in the discrepancy of the kNN distances, and it has very low computational complexity. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing that the probability that neighboring points are separated by the ensemble of random projection trees decays exponentially as the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable. Comment: 15 pages, 4 figures, 2018 IEEE Big Data Conference
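The aggregation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian projection directions, the median split, and the names `build_tree`, `leaf_of`, and `knn` are all assumptions made for illustration, whereas the paper chooses its projections more carefully. Each tree routes the query to one leaf, the leaves are merged into a candidate set, and an exact search runs over the candidates only.

```python
import random

def build_tree(points, idx, leaf_size, rng):
    """Recursively split the index list `idx` along random directions."""
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    dim = len(points[0])
    # Assumption: a plain Gaussian direction, not the paper's refined choice.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    proj = {i: sum(d * x for d, x in zip(direction, points[i])) for i in idx}
    ordered = sorted(idx, key=proj.get)
    mid = len(ordered) // 2
    # Assumption: split at the median projection to keep the tree balanced.
    return {
        "direction": direction,
        "split": proj[ordered[mid]],
        "left": build_tree(points, ordered[:mid], leaf_size, rng),
        "right": build_tree(points, ordered[mid:], leaf_size, rng),
    }

def leaf_of(tree, q):
    """Route a query point down one tree and return the indices in its leaf."""
    while "leaf" not in tree:
        p = sum(d * x for d, x in zip(tree["direction"], q))
        tree = tree["left"] if p < tree["split"] else tree["right"]
    return tree["leaf"]

def knn(points, forest, q, k):
    """Merge the query's leaf from every tree, then search the candidates exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(leaf_of(tree, q))

    def dist2(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], q))

    return sorted(candidates, key=dist2)[:k]

# Toy data: 200 random points in 5 dimensions, 10 trees, leaves of at most 10 points.
rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
forest = [build_tree(points, list(range(len(points))), 10, rng) for _ in range(10)]
neighbors = knn(points, forest, points[17], k=3)
```

Because the trees are built independently, the list comprehension that builds `forest` is the part that could be distributed across cores or machines.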

    Characterization the regulation of herpesvirus miRNAs from the view of human protein interaction network

    Background: miRNAs are a class of non-coding RNA molecules that play crucial roles in the regulation of virus-host interactions. The ever-increasing data on known viral miRNAs and the human protein interaction network (PIN) have made it possible to study the targeting characteristics of viral miRNAs in the context of these networks. Results: We performed topological analysis to explore the targeting propensities of herpesvirus miRNAs from the view of the human PIN and found that (1) herpesvirus miRNAs target significantly more hubs; moreover, compared with non-hubs (non-bottlenecks), hubs (bottlenecks) are targeted by many more virus miRNAs and virus types; (2) there are significant differences in degree and betweenness centrality between common and specific targets; specifically, we observed a significant positive correlation between the number of virus types targeting these nodes and the proportion of hubs; and (3) K-core and ER analysis determined that common targets are closer to the global PIN center. Compared with random conditions, the giant connected component (GCC) and the density of the sub-network formed by common targets have significantly higher values, indicating the modular character of these targets. Conclusions: Herpesvirus miRNAs preferentially target hubs and bottlenecks. There are significant differences between common and specific targets. Moreover, common targets are more densely connected and occupy the central part of the network. These results will help unravel the complex mechanism of herpesvirus-host interactions and may provide insight into the development of novel anti-herpesvirus drugs.
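Two of the topological measures named in this abstract, node degree (used to identify hubs) and K-core membership, can be sketched on a toy graph. The edge list, the node names, and the helper `core_numbers` are hypothetical and are not taken from the paper's data or pipeline. Betweenness centrality and the ER analysis are not covered here.

```python
from collections import defaultdict

# Hypothetical toy interaction network: a triangle A-B-C with a tail A-D-E.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E")]

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Degree: the number of interaction partners of each node.
degree = {v: len(adj[v]) for v in adj}
hub = max(degree, key=degree.get)  # highest-degree node in the toy graph

def core_numbers(adj):
    """Peel minimum-degree nodes one by one; a node's core number is the
    largest minimum degree seen up to the moment it is removed."""
    remaining_degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=remaining_degree.get)
        k = max(k, remaining_degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                remaining_degree[u] -= 1
    return core

core = core_numbers(adj)
```

In this toy graph the triangle forms the innermost core, which mirrors the idea that nodes with a higher core number lie closer to the center of the network.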

    Adonis: Practical and Efficient Control Flow Recovery through OS-Level Traces

    Control flow recovery is critical to ensuring software quality, especially for large-scale software in production environments. However, the efficiency of most current control flow recovery techniques is compromised by their runtime overheads along with deployment and development costs. To tackle this problem, we propose a novel solution, Adonis, which harnesses OS-level traces, such as dynamic library calls and system call traces, to efficiently and safely recover control flows in practice. Adonis operates in two steps: it first identifies the call-sites of trace entries, then it executes a pair-wise symbolic execution to recover valid execution paths. This technique has several advantages. First, Adonis does not require the insertion of any probes into existing applications, thereby minimizing runtime cost. Second, given that OS-level traces are hardware-independent, Adonis can be implemented across various hardware configurations without the need for hardware-specific engineering efforts, thus reducing deployment cost. Third, as Adonis is fully automated and does not depend on manually created logs, it avoids additional development cost. We conducted an evaluation of Adonis on representative desktop applications and real-world IoT applications. Adonis can faithfully recover the control flow with 86.8% recall and 81.7% precision. Compared to the state-of-the-art log-based approach, Adonis can not only cover all the execution paths that approach recovers, but also recover 74.9% of the statements that it cannot cover. In addition, the runtime cost of Adonis is 18.3× lower than that of the instrumentation-based approach; the analysis time and storage cost (indicative of the deployment cost) of Adonis are 50× and 443× smaller, respectively, than those of the hardware-based approach. To facilitate future replication and extension of this work, we have made the code and data publicly available.